Antipattern: Reference Non-grouped Columns

Let’s explore how misconception about the GROUP BY clause causes an antipattern by testing out different queries.

We'll cover the following

The single-value rule
Do-what-I-mean queries

The root cause of this antipattern is simple, and it reveals a common misconception that many programmers have about how grouping queries work in SQL.

The single-value rule#

The rows in each group consist only of items that have the same value in the column or columns we name after GROUP BY. For example, in the following query, there is a one-row group for each distinct value in product_id.

Enter to Rename, Shift+Enter to Preview

Retrieving the latest bug reported for each product

Every column in the select list of a query must have a single-value row per row group. This is called the Single-Value Rule. Columns named in the GROUP BY clause are guaranteed to be exactly one value per group, no matter how many rows the group matches.

The MAX() expression is also guaranteed to result in a single value for each group: the highest value found in the argument of MAX() from among all the rows in the group.

However, the database server can’t be as certain about any other column named in the select list. It can’t always guarantee that the same value occurs in every row in a group for the other columns.

Enter to Rename, Shift+Enter to Preview

Retrieving the latest bug reported for each product along with the ID

In this example, there are many distinct values for bug_id for a given product_id because the BugsProducts table associates multiple bugs to a given product. There’s no way to represent all the values of bug_id in a grouping query that reduces it to a single row per product.

Since there is no guarantee of a single value per group in the “extra” columns, the database assumes that they violate the Single-Value Rule. Most brands of databases report an error if we run a query that tries to return a column other than the columns named either in the GROUP BY clause or as arguments to aggregate functions.

MySQL and SQLite have different behavior from other brands of the database, which we have studied in the Legitimate uses of the antipattern section.

Do-what-I-mean queries#

The common misconception that programmers have is that SQL can guess which bug_id we want in the report because MAX() is used in another column. Most people assume that if the query fetches the greatest value, then other columns will naturally take their value from the same row where the greatest value occurs.

Unfortunately, SQL can’t make this inference in several cases:

If two bugs have the exact same value for date_reported and that is the greatest value in the group, which value of bug_id should the query report?
If we query for two different aggregate functions — for example, MAX() and MIN() — these probably correspond to two different rows in the group. Which bug_id should the query return for this group?

Enter to Rename, Shift+Enter to Preview

Retrieving records using two aggregate functions, where bug_id corresponds to two different rows

If none of the rows in the table matches the value returned by the aggregate function, what is the value of bug_id? This commonly occurs for the functions AVG(), COUNT(), and SUM().

Enter to Rename, Shift+Enter to Preview

Trying to retrieve records using aggregate function along with the bug_id

These are examples of why the Single-Value Rule is important. Not every query that fails to follow this rule would produce an ambiguous result, but many do. It would be clever if the database could tell an ambiguous query from an unambiguous one and produce an error only when the data contains ambiguity. But that would not be good for application reliability. It would mean that the same query might be valid or invalid, depending on the state of data.

Synopsis: Ambiguous Groups

Solution: Use Columns Unambiguously

Mark as Completed

Report an Issue

Introduction

Logical Antipattern - Jaywalking

Logical Antipattern - Naive Trees

Logical Antipattern - ID Required

Logical Antipattern - Keyless Entry

Assessment: Logical Antipatterns - Part 1

Logical Antipattern - Entity-Attribute-Value

Logical Antipattern - Polymorphic Associations

Logical Antipattern - Multicolumn Attributes

Logical Antipattern - Metadata Tribbles

Assessment: Logical Antipatterns - Part 2

Physical Antipattern - Rounding Errors

Physical Antipattern - 31 Flavors

Physical Antipattern - Phantom Files

Physical Antipattern - Index Shotgun

Assessment: Physical Antipatterns

Query Antipattern - Fear of the Unknown

Query Antipattern - Ambiguous Groups

Query Antipattern - Random Selection

Query Antipattern - Poor Man’s Search Engine

Query Antipattern - Spaghetti Query

Query Antipattern - Implicit Columns

Assessment: Query Antipattern

Application Development Antipattern - Readable Passwords

Application Development Antipattern - SQL Injection

Application Development Antipattern - Pseudokey Neat-Freak

Application Development Antipattern - See No Evil

Application Development Antipattern - Diplomatic Immunity

Application Development Antipattern - Magic Beans

Assessment: Application Development Antipattern

Conclusion

Antipattern: Reference Non-grouped Columns

The single-value rule#

Do-what-I-mean queries#